Overview

Dataset statistics

Number of variables: 12
Number of observations: 2969
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 278.5 KiB
Average record size in memory: 96.0 B
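The memory figures above can be roughly re-derived from the variable and observation counts. The sketch below assumes every value is stored as an 8-byte number; the report does not state the column dtypes, so `BYTES_PER_VALUE` is an assumption.

```python
# Figures copied from the "Dataset statistics" block of the report.
N_VARIABLES = 12
N_OBSERVATIONS = 2969

# Assumption (not stated in the report): each value occupies 8 bytes.
BYTES_PER_VALUE = 8

# One record holds one value per variable.
record_bytes = N_VARIABLES * BYTES_PER_VALUE
# One column holds one value per observation.
column_kib = N_OBSERVATIONS * BYTES_PER_VALUE / 1024
# All columns together, excluding any index overhead.
data_kib = N_OBSERVATIONS * record_bytes / 1024

print(f"record size: {record_bytes} B")
print(f"per-column size: {column_kib:.1f} KiB")
print(f"all columns: {data_kib:.1f} KiB")
```

Under that assumption the record size matches the reported 96.0 B and the per-column size matches the 23.2 KiB shown for each variable. The column total comes out slightly below the reported 278.5 KiB; the small gap may be index overhead, but the report does not say.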

Variable types

NUM: 12

Warnings

qty_items is highly correlated with gross_revenue [High correlation]
gross_revenue is highly correlated with qty_items [High correlation]
qtd_returns is highly correlated with avg_ticket and 1 other field [High correlation]
avg_ticket is highly correlated with qtd_returns and 1 other field [High correlation]
avg_basket_size is highly correlated with avg_ticket and 1 other field [High correlation]
avg_ticket is highly skewed (γ1 = 53.44422362) [Skewed]
frequency is highly skewed (γ1 = 24.88049136) [Skewed]
qtd_returns is highly skewed (γ1 = 51.79774426) [Skewed]
avg_basket_size is highly skewed (γ1 = 44.67271661) [Skewed]
customer_id has unique values [Unique]
recency_days has 34 (1.1%) zeros [Zeros]
qtd_returns has 1481 (49.9%) zeros [Zeros]
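
The γ1 values in the skewness warnings are sample skewness coefficients. The report does not say which estimator it uses; the sketch below assumes the bias-adjusted Fisher-Pearson coefficient, and the function name `sample_skewness` is illustrative only.

```python
import math


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (assumed estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = sum(values) / n
    # Second and third central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment applied to the raw coefficient g1.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]
print(sample_skewness(symmetric))
print(sample_skewness(right_tailed))
```

A symmetric sample gives a coefficient of zero, while a single large outlier on the right pushes it well above zero. That is the pattern behind the very large γ1 values flagged for avg_ticket and qtd_returns.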

Reproduction

Analysis started: 2022-09-19 14:20:34.705896
Analysis finished: 2022-09-19 14:21:02.588652
Duration: 27.88 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15270.77299
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 12347
5th percentile: 12619.4
Q1: 13799
Median: 15221
Q3: 16768
95th percentile: 17964.6
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2969

Descriptive statistics

Standard deviation: 1718.990292
Coefficient of variation (CV): 0.1125673398
Kurtosis: -1.206094692
Mean: 15270.77299
Median Absolute Deviation (MAD): 1488
Skewness: 0.03160785866
Sum: 45338925
Variance: 2954927.624
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
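
Several of the customer_id statistics are derived from others in the same block. As a consistency check, the sketch below recomputes them from the reported values; the variable names are illustrative and nothing here comes from the underlying data.

```python
# Values copied from the customer_id block of the report.
n = 2969
minimum, maximum = 12347, 18287
q1, q3 = 13799, 16768
mean = 15270.77299
std = 1718.990292
total = 45338925

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # CV = standard deviation / mean
variance = std ** 2               # Variance = standard deviation squared
mean_from_sum = total / n         # Mean = Sum / number of observations

print(value_range, iqr)
print(round(cv, 6), round(variance, 2), round(mean_from_sum, 4))
```

Each recomputed value agrees with the corresponding report entry up to the rounding of the printed inputs.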
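Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 14335 | 1 | < 0.1% |
| 15679 | 1 | < 0.1% |
| 15689 | 1 | < 0.1% |
| 17736 | 1 | < 0.1% |
| 15687 | 1 | < 0.1% |
| 17734 | 1 | < 0.1% |
| 13636 | 1 | < 0.1% |
| 12722 | 1 | < 0.1% |
| 13634 | 1 | < 0.1% |
| 15681 | 1 | < 0.1% |
| Other values (2959) | 2959 | 99.7% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 12347 | 1 | < 0.1% |
| 12348 | 1 | < 0.1% |
| 12352 | 1 | < 0.1% |
| 12356 | 1 | < 0.1% |
| 12358 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 18287 | 1 | < 0.1% |
| 18283 | 1 | < 0.1% |
| 18282 | 1 | < 0.1% |
| 18277 | 1 | < 0.1% |
| 18276 | 1 | < 0.1% |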
 

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2949
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2749.321711
Minimum: 6.2
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 6.2
5th percentile: 229.77
Q1: 570.96
Median: 1086.92
Q3: 2308.06
95th percentile: 7219.68
Maximum: 279138.02
Range: 279131.82
Interquartile range (IQR): 1737.1

Descriptive statistics

Standard deviation: 10580.62331
Coefficient of variation (CV): 3.848448607
Kurtosis: 353.944724
Mean: 2749.321711
Median Absolute Deviation (MAD): 672.16
Skewness: 16.77755612
Sum: 8162736.16
Variance: 111949589.6
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1353.74 | 2 | 0.1% |
| 178.96 | 2 | 0.1% |
| 52.2 | 2 | 0.1% |
| 533.33 | 2 | 0.1% |
| 1066.15 | 2 | 0.1% |
| 379.65 | 2 | 0.1% |
| 731.9 | 2 | 0.1% |
| 901.2 | 2 | 0.1% |
| 1078.96 | 2 | 0.1% |
| 2053.02 | 2 | 0.1% |
| Other values (2939) | 2949 | 99.3% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 6.2 | 1 | < 0.1% |
| 13.3 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
| 36.56 | 1 | < 0.1% |
| 45 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 279138.02 | 1 | < 0.1% |
| 259657.3 | 1 | < 0.1% |
| 194550.79 | 1 | < 0.1% |
| 168472.5 | 1 | < 0.1% |
| 140450.72 | 1 | < 0.1% |
 

recency_days
Real number (ℝ≥0)

ZEROS

Distinct: 272
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.28763894
Minimum: 0
Maximum: 373
Zeros: 34
Zeros (%): 1.1%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 11
Median: 31
Q3: 81
95th percentile: 242
Maximum: 373
Range: 373
Interquartile range (IQR): 70

Descriptive statistics

Standard deviation: 77.75677911
Coefficient of variation (CV): 1.209513686
Kurtosis: 2.777962659
Mean: 64.28763894
Median Absolute Deviation (MAD): 26
Skewness: 1.798379538
Sum: 190870
Variance: 6046.116697
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 99 | 3.3% |
| 4 | 87 | 2.9% |
| 3 | 85 | 2.9% |
| 2 | 85 | 2.9% |
| 8 | 76 | 2.6% |
| 10 | 67 | 2.3% |
| 9 | 66 | 2.2% |
| 7 | 66 | 2.2% |
| 17 | 64 | 2.2% |
| 16 | 55 | 1.9% |
| Other values (262) | 2219 | 74.7% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 34 | 1.1% |
| 1 | 99 | 3.3% |
| 2 | 85 | 2.9% |
| 3 | 85 | 2.9% |
| 4 | 87 | 2.9% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 373 | 2 | 0.1% |
| 372 | 4 | 0.1% |
| 371 | 1 | < 0.1% |
| 368 | 1 | < 0.1% |
| 366 | 4 | 0.1% |
 

qty_invoices
Real number (ℝ≥0)

Distinct: 56
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.723139104
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 6
95th percentile: 17
Maximum: 206
Range: 205
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 8.85653132
Coefficient of variation (CV): 1.547495379
Kurtosis: 190.8344494
Mean: 5.723139104
Median Absolute Deviation (MAD): 2
Skewness: 10.76680458
Sum: 16992
Variance: 78.43814702
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 785 | 26.4% |
| 3 | 499 | 16.8% |
| 4 | 393 | 13.2% |
| 5 | 237 | 8.0% |
| 1 | 190 | 6.4% |
| 6 | 173 | 5.8% |
| 7 | 138 | 4.6% |
| 8 | 98 | 3.3% |
| 9 | 69 | 2.3% |
| 10 | 55 | 1.9% |
| Other values (46) | 332 | 11.2% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 190 | 6.4% |
| 2 | 785 | 26.4% |
| 3 | 499 | 16.8% |
| 4 | 393 | 13.2% |
| 5 | 237 | 8.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 206 | 1 | < 0.1% |
| 199 | 1 | < 0.1% |
| 124 | 1 | < 0.1% |
| 97 | 1 | < 0.1% |
| 91 | 2 | 0.1% |
 

qty_items
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1671
Distinct (%): 56.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1608.852476
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 102.4
Q1: 296
Median: 641
Q3: 1401
95th percentile: 4407.4
Maximum: 196844
Range: 196843
Interquartile range (IQR): 1105

Descriptive statistics

Standard deviation: 5887.578045
Coefficient of variation (CV): 3.659489067
Kurtosis: 465.998084
Mean: 1608.852476
Median Absolute Deviation (MAD): 422
Skewness: 17.85859125
Sum: 4776683
Variance: 34663575.24
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 310 | 11 | 0.4% |
| 88 | 9 | 0.3% |
| 150 | 9 | 0.3% |
| 288 | 8 | 0.3% |
| 272 | 8 | 0.3% |
| 246 | 8 | 0.3% |
| 260 | 8 | 0.3% |
| 84 | 8 | 0.3% |
| 300 | 7 | 0.2% |
| 114 | 7 | 0.2% |
| Other values (1661) | 2886 | 97.2% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 2 | 0.1% |
| 12 | 2 | 0.1% |
| 16 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 196844 | 1 | < 0.1% |
| 80997 | 1 | < 0.1% |
| 80263 | 1 | < 0.1% |
| 77373 | 1 | < 0.1% |
| 69993 | 1 | < 0.1% |
 

qty_products
Real number (ℝ≥0)

Distinct: 468
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.7241495
Minimum: 1
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 9
Q1: 29
Median: 67
Q3: 135
95th percentile: 382
Maximum: 7838
Range: 7837
Interquartile range (IQR): 106

Descriptive statistics

Standard deviation: 269.8964081
Coefficient of variation (CV): 2.199211884
Kurtosis: 354.8611303
Mean: 122.7241495
Median Absolute Deviation (MAD): 44
Skewness: 15.70763473
Sum: 364368
Variance: 72844.07112
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 28 | 43 | 1.4% |
| 20 | 37 | 1.2% |
| 35 | 35 | 1.2% |
| 29 | 35 | 1.2% |
| 19 | 34 | 1.1% |
| 15 | 33 | 1.1% |
| 11 | 32 | 1.1% |
| 26 | 31 | 1.0% |
| 27 | 30 | 1.0% |
| 25 | 30 | 1.0% |
| Other values (458) | 2629 | 88.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 6 | 0.2% |
| 2 | 14 | 0.5% |
| 3 | 16 | 0.5% |
| 4 | 17 | 0.6% |
| 5 | 26 | 0.9% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 7838 | 1 | < 0.1% |
| 5673 | 1 | < 0.1% |
| 5095 | 1 | < 0.1% |
| 4580 | 1 | < 0.1% |
| 2698 | 1 | < 0.1% |
 

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 2965
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.89776151
Minimum: 2.150588235
Maximum: 56157.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 2.150588235
5th percentile: 4.916661099
Q1: 13.11933333
Median: 17.95658654
Q3: 24.98828571
95th percentile: 90.497
Maximum: 56157.5
Range: 56155.34941
Interquartile range (IQR): 11.86895238

Descriptive statistics

Standard deviation: 1036.934407
Coefficient of variation (CV): 19.98033011
Kurtosis: 2890.707126
Mean: 51.89776151
Median Absolute Deviation (MAD): 5.984842033
Skewness: 53.44422362
Sum: 154084.4539
Variance: 1075232.964
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 2 | 0.1% |
| 15.29 | 2 | 0.1% |
| 14.47833333 | 2 | 0.1% |
| 4.162 | 2 | 0.1% |
| 17.56944444 | 1 | < 0.1% |
| 29.93477612 | 1 | < 0.1% |
| 21.52698113 | 1 | < 0.1% |
| 8.581532258 | 1 | < 0.1% |
| 81.6 | 1 | < 0.1% |
| 33.925 | 1 | < 0.1% |
| Other values (2955) | 2955 | 99.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 2.150588235 | 1 | < 0.1% |
| 2.4325 | 1 | < 0.1% |
| 2.462371134 | 1 | < 0.1% |
| 2.511241379 | 1 | < 0.1% |
| 2.515333333 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 56157.5 | 1 | < 0.1% |
| 4453.43 | 1 | < 0.1% |
| 3202.92 | 1 | < 0.1% |
| 1687.2 | 1 | < 0.1% |
| 952.9875 | 1 | < 0.1% |
 

avg_recency_days
Real number (ℝ≥0)

Distinct: 1258
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.34851138
Minimum: 1
Maximum: 366
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 8
Q1: 25.92307692
Median: 48.28571429
Q3: 85.33333333
95th percentile: 201
Maximum: 366
Range: 365
Interquartile range (IQR): 59.41025641

Descriptive statistics

Standard deviation: 63.54492876
Coefficient of variation (CV): 0.9435238799
Kurtosis: 4.887109087
Mean: 67.34851138
Median Absolute Deviation (MAD): 26.28571429
Skewness: 2.062770925
Sum: 199957.7303
Variance: 4037.957972
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 25 | 0.8% |
| 4 | 22 | 0.7% |
| 70 | 21 | 0.7% |
| 7 | 20 | 0.7% |
| 35 | 19 | 0.6% |
| 49 | 18 | 0.6% |
| 21 | 17 | 0.6% |
| 11 | 17 | 0.6% |
| 46 | 17 | 0.6% |
| 42 | 16 | 0.5% |
| Other values (1248) | 2777 | 93.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 16 | 0.5% |
| 1.5 | 1 | < 0.1% |
| 2 | 13 | 0.4% |
| 2.5 | 1 | < 0.1% |
| 2.601398601 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 366 | 1 | < 0.1% |
| 365 | 1 | < 0.1% |
| 363 | 1 | < 0.1% |
| 362 | 1 | < 0.1% |
| 357 | 2 | 0.1% |
 

frequency
Real number (ℝ≥0)

SKEWED

Distinct: 1225
Distinct (%): 41.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1137973039
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 0.005449591281
5th percentile: 0.008894164194
Q1: 0.01633986928
Median: 0.02588996764
Q3: 0.04945054945
95th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.03311068017

Descriptive statistics

Standard deviation: 0.4081562524
Coefficient of variation (CV): 3.586695275
Kurtosis: 989.3650758
Mean: 0.1137973039
Median Absolute Deviation (MAD): 0.0121913375
Skewness: 24.88049136
Sum: 337.8641954
Variance: 0.1665915263
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 198 | 6.7% |
| 0.0625 | 18 | 0.6% |
| 0.02777777778 | 17 | 0.6% |
| 0.02380952381 | 16 | 0.5% |
| 0.09090909091 | 15 | 0.5% |
| 0.08333333333 | 15 | 0.5% |
| 0.02941176471 | 14 | 0.5% |
| 0.03448275862 | 14 | 0.5% |
| 0.01923076923 | 13 | 0.4% |
| 0.07692307692 | 13 | 0.4% |
| Other values (1215) | 2636 | 88.8% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.005449591281 | 1 | < 0.1% |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005586592179 | 2 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 17 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 2 | 6 | 0.2% |
| 1.142857143 | 1 | < 0.1% |
| 1 | 198 | 6.7% |
 

qtd_returns
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 214
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.1569552
Minimum: 0
Maximum: 80995
Zeros: 1481
Zeros (%): 49.9%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 9
95th percentile: 100.6
Maximum: 80995
Range: 80995
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 1512.496135
Coefficient of variation (CV): 24.33349783
Kurtosis: 2765.52864
Mean: 62.1569552
Median Absolute Deviation (MAD): 1
Skewness: 51.79774426
Sum: 184544
Variance: 2287644.557
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1481 | 49.9% |
| 1 | 164 | 5.5% |
| 2 | 148 | 5.0% |
| 3 | 105 | 3.5% |
| 4 | 89 | 3.0% |
| 6 | 78 | 2.6% |
| 5 | 61 | 2.1% |
| 12 | 51 | 1.7% |
| 7 | 43 | 1.4% |
| 8 | 43 | 1.4% |
| Other values (204) | 706 | 23.8% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1481 | 49.9% |
| 1 | 164 | 5.5% |
| 2 | 148 | 5.0% |
| 3 | 105 | 3.5% |
| 4 | 89 | 3.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 80995 | 1 | < 0.1% |
| 9014 | 1 | < 0.1% |
| 8004 | 1 | < 0.1% |
| 4427 | 1 | < 0.1% |
| 3768 | 1 | < 0.1% |
 

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1979
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 249.8137641
Minimum: 1
Maximum: 40498.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 44
Q1: 103.25
Median: 172.3333333
Q3: 281.6923077
95th percentile: 600
Maximum: 40498.5
Range: 40497.5
Interquartile range (IQR): 178.4423077

Descriptive statistics

Standard deviation: 791.5551894
Coefficient of variation (CV): 3.168581172
Kurtosis: 2255.538236
Mean: 249.8137641
Median Absolute Deviation (MAD): 83.08333333
Skewness: 44.67271661
Sum: 741697.0657
Variance: 626559.6179
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 11 | 0.4% |
| 114 | 10 | 0.3% |
| 73 | 9 | 0.3% |
| 82 | 9 | 0.3% |
| 86 | 9 | 0.3% |
| 136 | 8 | 0.3% |
| 60 | 8 | 0.3% |
| 88 | 8 | 0.3% |
| 75 | 8 | 0.3% |
| 129 | 7 | 0.2% |
| Other values (1969) | 2882 | 97.1% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2 | 0.1% |
| 2 | 1 | < 0.1% |
| 3.333333333 | 1 | < 0.1% |
| 5.333333333 | 1 | < 0.1% |
| 5.666666667 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 40498.5 | 1 | < 0.1% |
| 6009.333333 | 1 | < 0.1% |
| 4282 | 1 | < 0.1% |
| 3906 | 1 | < 0.1% |
| 3868.65 | 1 | < 0.1% |
 

avg_unique_basket_size
Real number (ℝ≥0)

Distinct: 906
Distinct (%): 30.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.48459137
Minimum: 0.2
Maximum: 259
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 0.2
5th percentile: 2
Q1: 7.666666667
Median: 13.6
Q3: 22.14285714
95th percentile: 46
Maximum: 259
Range: 258.8
Interquartile range (IQR): 14.47619048

Descriptive statistics

Standard deviation: 15.46030748
Coefficient of variation (CV): 0.8842246955
Kurtosis: 29.31744084
Mean: 17.48459137
Median Absolute Deviation (MAD): 6.6
Skewness: 3.43586152
Sum: 51911.75179
Variance: 239.0211074
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 42 | 1.4% |
| 9 | 41 | 1.4% |
| 8 | 39 | 1.3% |
| 16 | 39 | 1.3% |
| 14 | 38 | 1.3% |
| 17 | 38 | 1.3% |
| 11 | 36 | 1.2% |
| 5 | 36 | 1.2% |
| 7 | 36 | 1.2% |
| 15 | 35 | 1.2% |
| Other values (896) | 2589 | 87.2% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.2 | 1 | < 0.1% |
| 0.25 | 3 | 0.1% |
| 0.3333333333 | 6 | 0.2% |
| 0.4 | 1 | < 0.1% |
| 0.4090909091 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 259 | 1 | < 0.1% |
| 177 | 1 | < 0.1% |
| 148 | 1 | < 0.1% |
| 127 | 1 | < 0.1% |
| 105 | 1 | < 0.1% |
 

Interactions

[Figures: 144 pairwise interaction plots, one for each ordered pair of the 12 variables]

Correlations

[Figures: correlation matrices for Pearson's r, Spearman's ρ, Kendall's τ and Phik (φk)]

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
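
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself ran; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))    # increasing line
print(pearson_r(xs, [-3 * x + 7 for x in xs]))   # decreasing line
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the invariance property mentioned above.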

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
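
As a rough sketch of that recipe, the code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their ranks is an assumption (a common convention; the report does not specify tie handling), and the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]
print(spearman_rho(xs, cubes))  # monotonic but nonlinear relationship
print(pearson_r(xs, cubes))     # linear measure on the same data
```

For the cubic relationship the ranks of both variables coincide, so ρ reaches its maximum while Pearson's r stays below 1.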

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
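
The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch follows, with the hypothetical name `kendall_tau_a`. Statistical libraries often report a tie-corrected variant instead, so values in the report's plot may differ from this simple form when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # Pairs tied in either variable count as neither.
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the last call, five of the six pairs are concordant and one is discordant, so τ is (5 - 1) / 6.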

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figures: missing-value plots; the dataset has no missing cells]

Sample

First rows

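| | customer_id | gross_revenue | recency_days | qty_invoices | qty_items | qty_products | avg_ticket | avg_recency_days | frequency | qtd_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17850 | 5391.21 | 372.0 | 34.0 | 1733.0 | 297.0 | 18.152222 | 35.500000 | 17.000000 | 40.0 | 50.970588 | 0.617647 |
| 1 | 13047 | 3232.59 | 56.0 | 9.0 | 1390.0 | 171.0 | 18.904035 | 27.250000 | 0.028302 | 35.0 | 154.444444 | 11.666667 |
| 2 | 12583 | 6705.38 | 2.0 | 15.0 | 5028.0 | 232.0 | 28.902500 | 23.187500 | 0.040323 | 50.0 | 335.200000 | 7.600000 |
| 3 | 13748 | 948.25 | 95.0 | 5.0 | 439.0 | 28.0 | 33.866071 | 92.666667 | 0.017921 | 0.0 | 87.800000 | 4.800000 |
| 4 | 15100 | 876.00 | 333.0 | 3.0 | 80.0 | 3.0 | 292.000000 | 8.600000 | 0.073171 | 22.0 | 26.666667 | 0.333333 |
| 5 | 15291 | 4623.30 | 25.0 | 14.0 | 2102.0 | 102.0 | 45.326471 | 23.200000 | 0.040115 | 29.0 | 150.142857 | 4.357143 |
| 6 | 14688 | 5630.87 | 7.0 | 21.0 | 3621.0 | 327.0 | 17.219786 | 18.300000 | 0.057221 | 399.0 | 172.428571 | 7.047619 |
| 7 | 17809 | 5411.91 | 16.0 | 12.0 | 2057.0 | 61.0 | 88.719836 | 35.700000 | 0.033520 | 41.0 | 171.416667 | 3.833333 |
| 8 | 15311 | 60767.90 | 0.0 | 91.0 | 38194.0 | 2379.0 | 25.543464 | 4.144444 | 0.243316 | 474.0 | 419.714286 | 6.230769 |
| 9 | 16098 | 2005.63 | 87.0 | 7.0 | 613.0 | 67.0 | 29.934776 | 47.666667 | 0.024390 | 0.0 | 87.571429 | 4.857143 |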

Last rows

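| | customer_id | gross_revenue | recency_days | qty_invoices | qty_items | qty_products | avg_ticket | avg_recency_days | frequency | qtd_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2959 | 17727 | 1060.25 | 15.0 | 1.0 | 645.0 | 66.0 | 16.064394 | 6.0 | 1.000000 | 6.0 | 645.000000 | 66.000000 |
| 2960 | 17232 | 421.52 | 2.0 | 2.0 | 203.0 | 36.0 | 11.708889 | 12.0 | 0.153846 | 0.0 | 101.500000 | 15.000000 |
| 2961 | 17468 | 137.00 | 10.0 | 2.0 | 116.0 | 5.0 | 27.400000 | 4.0 | 0.400000 | 0.0 | 58.000000 | 2.500000 |
| 2962 | 13596 | 697.04 | 5.0 | 2.0 | 406.0 | 166.0 | 4.199036 | 7.0 | 0.250000 | 0.0 | 203.000000 | 66.500000 |
| 2963 | 14893 | 1237.85 | 9.0 | 2.0 | 799.0 | 73.0 | 16.956849 | 2.0 | 0.666667 | 0.0 | 399.500000 | 36.000000 |
| 2964 | 12479 | 473.20 | 11.0 | 1.0 | 382.0 | 30.0 | 15.773333 | 4.0 | 1.000000 | 34.0 | 382.000000 | 30.000000 |
| 2965 | 14126 | 706.13 | 7.0 | 3.0 | 508.0 | 15.0 | 47.075333 | 3.0 | 0.750000 | 50.0 | 169.333333 | 4.666667 |
| 2966 | 13521 | 1092.39 | 1.0 | 3.0 | 733.0 | 435.0 | 2.511241 | 4.5 | 0.300000 | 0.0 | 244.333333 | 104.000000 |
| 2967 | 15060 | 301.84 | 8.0 | 4.0 | 262.0 | 120.0 | 2.515333 | 1.0 | 2.000000 | 0.0 | 65.500000 | 20.000000 |
| 2968 | 12558 | 269.96 | 7.0 | 1.0 | 196.0 | 11.0 | 24.541818 | 6.0 | 1.000000 | 196.0 | 196.000000 | 11.000000 |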